Abstract

We consider three aspects of avoiding large squares in infinite binary words. First,
we construct an infinite binary word avoiding both cubes $xxx$ and squares $yy$ with
$|y| \ge 4$; our construction is somewhat simpler than the original construction of Dekking.
Second, we construct an infinite binary word avoiding all squares except $0^2$, $1^2$, and
$(01)^2$; our construction is somewhat simpler than the original construction of Fraenkel
and Simpson. In both cases, we also show how to modify our construction to obtain
exponentially many words of length $n$ with the given avoidance properties. Finally,
we answer an open question of Prodinger and Urbanek from 1979 by demonstrating
the existence of two infinite binary words, each avoiding arbitrarily large squares, such
that their perfect shuffle has arbitrarily large squares.

1 Introduction 

A square is a nonempty word of the form $xx$, as in the English word murmur. It is easy to
see that every word of length $\ge 4$ constructed from the symbols 0 and 1 contains a square,
so it is impossible to avoid squares in infinite binary words. However, in 1974, Entringer,
Jackson, and Schatz [3] proved the surprising fact that there exists an infinite binary word
containing no squares $xx$ with $|x| \ge 3$. Further, the bound 3 is best possible.

A cube is a nonempty word of the form $xxx$, as in the English sort-of-word shshsh.
Dekking [2] showed that there exists an infinite binary word that contains no cubes $xxx$ and
no squares $yy$ with $|y| \ge 4$. Furthermore, the bound 4 is best possible.
Dekking's construction used iterated morphisms. By a morphism we understand a map
$h : \Sigma^* \to \Delta^*$ such that $h(xy) = h(x)h(y)$ for all $x, y \in \Sigma^*$. A morphism may be
specified by providing the image words $h(a)$ for all $a \in \Sigma$. If $h : \Sigma^* \to \Sigma^*$ and
$h(a) = ax$ for some letter $a \in \Sigma$, then we say that $h$ is prolongable on $a$, and we can then
iterate $h$ infinitely often to get the fixed point $h^\omega(a) := a\, x\, h(x)\, h^2(x)\, h^3(x) \cdots$.

A morphism is $k$-uniform if $|h(a)| = k$ for all $a \in \Sigma$; it is uniform if it is $k$-uniform for
some $k$. Uniform morphisms have particularly nice properties. For example, the class of
words generated by iterating $k$-uniform morphisms coincides with the class of $k$-automatic
sequences, generated by finite automata [1].

Dekking's construction used a non-uniform morphism. In this paper we first show how
to obtain, using the image of a uniform morphism, an infinite binary word that is cubefree
and avoids squares $yy$ with $|y| \ge 4$. Our construction is somewhat simpler than Dekking's.

Fraenkel and Simpson [4] strengthened the results of Entringer, Jackson, and Schatz by
showing that there exists an infinite binary word avoiding all squares except $0^2$, $1^2$, and
$(01)^2$. Their construction, however, was rather complicated, involving several steps and
non-uniform morphisms. In this paper we show how to obtain a word where the only squares are
$0^2$, $1^2$, and $(01)^2$, using a uniform morphism. Our construction is somewhat simpler than
that of Fraenkel and Simpson.

We also consider the number of finite binary words satisfying the Dekking and Fraenkel- 
Simpson avoidance properties. We give exponential upper and lower bounds on this number 
in both cases. 

Prodinger and Urbanek [7] also studied words avoiding large squares, in particular with
reference to operations that preserve this property, such as the perfect shuffle Ш. Let
$w = a_1 a_2 \cdots a_n$ and $x = b_1 b_2 \cdots b_n$ be words of length $n$. The perfect shuffle $w$ Ш $x$
is defined to be the word $a_1 b_1 a_2 b_2 \cdots a_n b_n$ of length $2n$. The definition can easily be
extended to infinite words. They stated the following open question: do there exist two infinite
words avoiding large squares such that their perfect shuffle has arbitrarily large squares? In
this paper we resolve this question by exhibiting an example.
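As a small illustration of the definition of the perfect shuffle, the following Python sketch interleaves two finite words of equal length. The function name `perfect_shuffle` is our own choice for this illustration and does not appear in the original text.

```python
def perfect_shuffle(w: str, x: str) -> str:
    """Return a1 b1 a2 b2 ... an bn for w = a1...an and x = b1...bn."""
    if len(w) != len(x):
        raise ValueError("the perfect shuffle is defined for words of equal length")
    return "".join(a + b for a, b in zip(w, x))


# Shuffling two words of length 3 gives a word of length 6.
example = perfect_shuffle("010", "001")
print(example)  # prints 001001
```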

2 A cubefree word without arbitrarily long squares 

In this section we construct an infinite cubefree binary word avoiding squares $yy$ with $|y| \ge 4$.
The techniques we use are also used in later sections, so in this section we spell them out in
some detail.

We introduce the following notation for alphabets: $\Sigma_k := \{0, 1, \ldots, k-1\}$.

Theorem 1 There is a squarefree infinite word over $\Sigma_4$ with no occurrences of the subwords
12, 13, 21, 32, 231, or 10302.
Proof. Let the morphism $h$ be defined by

    0 -> 0310201023
    1 -> 0310230102
    2 -> 0201031023
    3 -> 0203010201

Then we claim the fixed point $h^\omega(0)$ has the desired properties.

First, we claim that if $w \in \Sigma_4^*$ then $h(w)$ has no occurrences of 12, 13, 21, 32, 231, or
10302. For if any of these words occur as subwords of $h(w)$, they must occur within some
$h(a)$ or straddling the boundary between $h(a)$ and $h(b)$, for some single letters $a, b$. They do
not; this easy verification is left to the reader.
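The verification left to the reader can be sketched in a few lines of Python. Since every forbidden word has length at most 5 and every image has length 10, it is enough to inspect the sixteen words $h(ab)$. The helper names below (`apply_morphism`, `iterate`, `forbidden_hits`) are assumptions of this sketch, not notation from the text.

```python
H = {"0": "0310201023", "1": "0310230102", "2": "0201031023", "3": "0203010201"}
FORBIDDEN = ("12", "13", "21", "32", "231", "10302")


def apply_morphism(morphism: dict, word: str) -> str:
    """Apply a morphism letter by letter."""
    return "".join(morphism[letter] for letter in word)


def iterate(morphism: dict, start: str, times: int) -> str:
    """Return morphism^times(start), a prefix of the fixed point when prolongable."""
    word = start
    for _ in range(times):
        word = apply_morphism(morphism, word)
    return word


def forbidden_hits(morphism: dict) -> list:
    """List every (ab, f) such that the forbidden word f occurs in h(ab)."""
    hits = []
    for a in morphism:
        for b in morphism:
            image = apply_morphism(morphism, a + b)
            hits.extend((a + b, f) for f in FORBIDDEN if f in image)
    return hits


prefix = iterate(H, "0", 3)       # the first 1000 letters of the fixed point
print(prefix[:50])
print(forbidden_hits(H))          # an empty list means no forbidden subword occurs
```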

Next, we prove that if $w$ is any squarefree word over $\Sigma_4$ having no occurrences of 12, 13,
21, or 32, then $h(w)$ is squarefree.

We argue by contradiction. Let $w = a_1 a_2 \cdots a_n$ be a squarefree string such that $h(w)$
contains a square, i.e., $h(w) = xyyz$ for some $x, z \in \Sigma_4^*$, $y \in \Sigma_4^+$. Without loss of generality,
assume that $w$ is a shortest such string, so that $0 \le |x|, |z| < 10$.

Case 1: $|y| \le 20$. In this case we can take $|w| \le 5$. To verify that $h(w)$ is squarefree,
it therefore suffices to check each of the 49 possible words $w \in \Sigma_4^5$ to ensure that $h(w)$ is
squarefree in each case.
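The finite check in Case 1 can be carried out mechanically. The sketch below, whose function names are our own, enumerates the squarefree words of length 5 over $\Sigma_4$ that avoid 12, 13, 21, and 32, and tests each image under $h$ for squares by brute force.

```python
from itertools import product

H = {"0": "0310201023", "1": "0310230102", "2": "0201031023", "3": "0203010201"}


def has_square(word: str) -> bool:
    """True if word contains a factor xx with x nonempty."""
    n = len(word)
    return any(
        word[i:i + half] == word[i + half:i + 2 * half]
        for half in range(1, n // 2 + 1)
        for i in range(n - 2 * half + 1)
    )


def admissible_words(length: int) -> list:
    """Squarefree words over {0,1,2,3} with no occurrence of 12, 13, 21, or 32."""
    banned = ("12", "13", "21", "32")
    words = ("".join(p) for p in product("0123", repeat=length))
    return [w for w in words if not has_square(w) and not any(b in w for b in banned)]


candidates = admissible_words(5)
bad = [w for w in candidates if has_square("".join(H[c] for c in w))]
print(len(candidates), bad)
```

An empty list `bad` means that every one of the candidate words has a squarefree image.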

Case 2: $|y| > 20$. First, we establish the following result.

Lemma 2 (a) Suppose $h(ab) = t\,h(c)\,u$ for some letters $a, b, c \in \Sigma_4$ and strings $t, u \in \Sigma_4^*$.
Then this inclusion is trivial (that is, $t = \epsilon$ or $u = \epsilon$) or $u$ is not a prefix of $h(d)$ for
any $d \in \Sigma_4$.

(b) Suppose there exist letters $a, b, c$ and strings $s, t, u, v$ such that $h(a) = st$, $h(b) = uv$,
and $h(c) = sv$. Then either $a = c$ or $b = c$.

Proof.

(a) This can be verified with a short computation. In fact, the only $a, b, c$ for which
the equality $h(ab) = t\,h(c)\,u$ holds nontrivially is $h(31) = t\,h(2)\,u$, and in this case
$t = 020301$, $u = 0102$, so $u$ is not a prefix of any $h(d)$.

(b) This can also be verified with a short computation. If $|s| \ge 6$, then no two distinct
letters have images under $h$ that share a prefix of length 6. If $|s| \le 5$, then $|t| \ge 5$, and
no two distinct letters have images under $h$ that share a suffix of length 5.

■ 
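Both short computations in the proof of Lemma 2 can be reproduced as follows. This is only a sketch under our own naming (`nontrivial_inclusions` and the variables below are not from the text): it searches all $h(ab)$ for an interior occurrence of some $h(c)$, and compares prefixes of length 6 and suffixes of length 5 of the four images.

```python
H = {"0": "0310201023", "1": "0310230102", "2": "0201031023", "3": "0203010201"}


def nontrivial_inclusions(h: dict) -> list:
    """All (a, b, c, t, u) with h(ab) = t h(c) u and both t and u nonempty."""
    found = []
    for a in h:
        for b in h:
            ab = h[a] + h[b]
            for c in h:
                k = len(h[c])
                # i >= 1 keeps t nonempty; i <= len(ab) - k - 1 keeps u nonempty
                for i in range(1, len(ab) - k):
                    if ab[i:i + k] == h[c]:
                        found.append((a, b, c, ab[:i], ab[i + k:]))
    return found


inclusions = nontrivial_inclusions(H)
# Part (a): the leftover u must not be a prefix of any image word.
u_is_prefix = [inc for inc in inclusions
               if any(image.startswith(inc[4]) for image in H.values())]
# Part (b): prefixes of length 6 and suffixes of length 5 separate the letters.
prefixes6 = {image[:6] for image in H.values()}
suffixes5 = {image[-5:] for image in H.values()}
print(inclusions, u_is_prefix, len(prefixes6), len(suffixes5))
```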

Once Lemma 2 is established, the rest of the argument is fairly standard. It can be found,
for example, in [5], but for completeness we repeat it here.

For $i = 1, 2, \ldots, n$ define $A_i = h(a_i)$. Then if $h(w) = xyyz$, we can write

    $h(w) = A_1 A_2 \cdots A_n = A_1' A_1'' A_2 \cdots A_{j-1} A_j' A_j'' A_{j+1} \cdots A_{n-1} A_n' A_n''$

where

    $A_1 = A_1' A_1''$
    $A_j = A_j' A_j''$
    $A_n = A_n' A_n''$
    $x = A_1'$
    $y = A_1'' A_2 \cdots A_{j-1} A_j' = A_j'' A_{j+1} \cdots A_{n-1} A_n'$
    $z = A_n''$,

where $|A_1''|, |A_j''| > 0$. See Figure 1.

Figure 1: The string $xyyz$ within $h(w)$

If $|A_1''| > |A_j''|$, then $A_{j+1} = h(a_{j+1})$ is a subword of $A_1'' A_2$, hence a subword of
$A_1 A_2 = h(a_1 a_2)$. Thus we can write $A_{j+2} = A_{j+2}' A_{j+2}''$ with

    $A_1'' A_2 = A_j'' A_{j+1} A_{j+2}'$.

See Figure 2.

Figure 2: The case $|A_1''| > |A_j''|$

But then, by Lemma 2 (a), either $|A_j''| = 0$, or $|A_1''| = |A_j''|$, or $A_{j+2}'$ is not a prefix of
any $h(d)$. All three conclusions are impossible.

If $|A_1''| < |A_j''|$, then $A_2 = h(a_2)$ is a subword of $A_j'' A_{j+1}$, hence a subword of
$A_j A_{j+1} = h(a_j a_{j+1})$. Thus we can write $A_3 = A_3' A_3''$ with

    $A_1'' A_2 A_3' = A_j'' A_{j+1}$.

See Figure 3.

Figure 3: The case $|A_1''| < |A_j''|$

By Lemma 2 (a), either $|A_1''| = 0$, or $|A_1''| = |A_j''|$, or $A_3'$ is not a prefix of any $h(d)$. Again
all three conclusions are impossible.

Therefore $|A_1''| = |A_j''|$. Hence $A_1'' = A_j''$, $A_2 = A_{j+1}$, ..., $A_{j-1} = A_{n-1}$, and $A_j' = A_n'$.
Since $h$ is injective, we have $a_2 = a_{j+1}, \ldots, a_{j-1} = a_{n-1}$. It also follows that $|y|$ is divisible
by 10 and $A_j = A_j' A_j'' = A_n' A_1''$. But by Lemma 2 (b), either (1) $a_j = a_n$ or (2) $a_j = a_1$.
In the first case, $a_2 \cdots a_{j-1} a_j = a_{j+1} \cdots a_{n-1} a_n$, so $w$ contains the square
$(a_2 \cdots a_{j-1} a_j)^2$, a contradiction. In the second case, $a_1 \cdots a_{j-1} = a_j a_{j+1} \cdots a_{n-1}$, so
$w$ contains the square $(a_1 \cdots a_{j-1})^2$, a contradiction.

It now follows that the infinite word

    $h^\omega(0) = 03102010230203010201031023010203102010230201031023 \cdots$

is squarefree and contains no occurrences of 12, 13, 21, 32, 231, or 10302. ■

Theorem 3 Let $w$ be any infinite word satisfying the conditions of Theorem 1. Define a
morphism $g$ by

    0 -> 010011
    1 -> 010110
    2 -> 011001
    3 -> 011010

Then $g(w)$ is a cubefree word containing no squares $xx$ with $|x| \ge 4$.

Before we begin the proof, we remark that all the words 12, 13, 21, 32, 231, 10302 must
indeed be avoided, because

    $g(12)$ contains the squares $(0110)^2$, $(1100)^2$, $(1001)^2$
    $g(13)$ contains the square $(0110)^2$
    $g(21)$ contains the cube $(01)^3$
    $g(32)$ contains the square $(1001)^2$
    $g(231)$ contains the square $(10010110)^2$
    $g(10302)$ contains the square $(100100110110)^2$.
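Each of these claims is a finite string search, so it can be confirmed directly. In the sketch below the dictionary `WITNESSES` simply transcribes the list above; the names are ours.

```python
G = {"0": "010011", "1": "010110", "2": "011001", "3": "011010"}


def g(word: str) -> str:
    """Apply the 6-uniform morphism g letter by letter."""
    return "".join(G[c] for c in word)


# For each forbidden word, the repetitions claimed to occur in its image.
WITNESSES = {
    "12": ["0110" * 2, "1100" * 2, "1001" * 2],
    "13": ["0110" * 2],
    "21": ["01" * 3],
    "32": ["1001" * 2],
    "231": ["10010110" * 2],
    "10302": ["100100110110" * 2],
}

missing = [(w, r) for w, reps in WITNESSES.items() for r in reps if r not in g(w)]
print(missing)  # an empty list confirms every claimed occurrence
```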

Proof. The proof parallels the proof of Theorem 1. Let $w = a_1 a_2 \cdots a_n$ be a squarefree
string, with no occurrences of 12, 13, 21, 32, 231, or 10302. We first establish that if
$g(w) = xyyz$ for some $x, z \in \Sigma_2^*$, $y \in \Sigma_2^+$, then $|y| \le 3$. Without loss of generality, assume $w$
is a shortest such string, so $0 \le |x|, |z| < 6$.

Case 1: $|y| \le 12$. In this case we can take $|w| \le 5$. To verify that $g(w)$ contains no
squares $yy$ with $|y| \ge 4$, it suffices to check each of the 41 possible words $w \in \Sigma_4^5$.

Case 2: $|y| > 12$. First, we establish the analogue of Lemma 2.

Lemma 4 (a) Suppose $g(ab) = t\,g(c)\,u$ for some letters $a, b, c \in \Sigma_4$ and strings $t, u \in \Sigma_2^*$.
Then this inclusion is trivial (that is, $t = \epsilon$ or $u = \epsilon$) or $u$ is not a prefix of $g(d)$ for
any $d \in \Sigma_4$.

(b) Suppose there exist letters $a, b, c$ and strings $s, t, u, v$ such that $g(a) = st$, $g(b) = uv$,
and $g(c) = sv$. Then either $a = c$ or $b = c$, or $a = 2$, $b = 1$, $c = 3$, $s = 0110$, $t = 01$,
$u = 0101$, $v = 10$.

Proof.

(a) This can be verified with a short computation. The only $a, b, c$ for which $g(ab) = t\,g(c)\,u$
holds nontrivially are

    $g(01) = 010\; g(3)\; 110$
    $g(10) = 01\; g(2)\; 0011$
    $g(23) = 0110\; g(1)\; 10$.

But none of 110, 0011, 10 are prefixes of any $g(d)$.

(b) If $|s| \ge 5$ then no two distinct letters have images under $g$ that share a prefix of length
5. If $|s| \le 3$ then $|t| \ge 3$, and no two distinct letters share a suffix of length 3. Hence
$|s| = 4$, $|t| = 2$. But only $g(2)$ and $g(3)$ share a prefix of length 4, and only $g(1)$ and
$g(3)$ share a suffix of length 2.

■

The rest of the proof is exactly parallel to the proof of Theorem 1, with the following
exception. When we get to the final case, where $|y|$ is divisible by 6, we can use Lemma 4
to rule out every case except where $x = 0101$, $z = 01$, $a_1 = 1$, $a_j = 3$, and $a_n = 2$. Thus
$w = 1\alpha 3\alpha 2$ for some string $\alpha \in \Sigma_4^*$. This special case is ruled out by the following lemma:

Lemma 5 Suppose $\alpha \in \Sigma_4^*$, and let $w = 1\alpha 3\alpha 2$. Then either $w$ contains a square, or $w$
contains an occurrence of one of the subwords 12, 13, 21, 32, 231, or 10302.

Proof. This can be verified by checking (a) all strings $w$ with $|w| \le 4$, and (b) all strings
of the form $w = abcw'de$, where $a, b, c, d, e \in \Sigma_4$ and $w' \in \Sigma_4^*$. (Here $w'$ may be treated as
an indeterminate.) ■
It now remains to show that if $w$ is squarefree and contains no occurrence of 12, 13, 21,
32, 231, or 10302, then $g(w)$ is cubefree. If $g(w)$ contains a cube $yyy$, then it contains a
square $yy$, and from what precedes we know $|y| \le 3$. It therefore suffices to show that $g(w)$
contains no occurrence of $0^3$, $1^3$, $(01)^3$, $(10)^3$, $(001)^3$, $(010)^3$, $(011)^3$, $(100)^3$, $(101)^3$, $(110)^3$.
The longest such string is of length 9, so it suffices to examine the 16 possibilities for $g(w)$
where $|w| = 3$. This is left to the reader.

The proof of Theorem 3 is now complete. ■
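The check left to the reader amounts to a search over the 16 admissible words of length 3. A possible sketch, with helper names of our own choosing, is the following.

```python
from itertools import product

G = {"0": "010011", "1": "010110", "2": "011001", "3": "011010"}
FORBIDDEN = ("12", "13", "21", "32", "231", "10302")
# The ten cubes yyy with |y| <= 3 listed in the text.
CUBES = [y * 3 for y in ("0", "1", "01", "10", "001", "010", "011", "100", "101", "110")]


def has_square(word: str) -> bool:
    """True if word contains a factor xx with x nonempty."""
    n = len(word)
    return any(
        word[i:i + half] == word[i + half:i + 2 * half]
        for half in range(1, n // 2 + 1)
        for i in range(n - 2 * half + 1)
    )


words = [
    w for w in ("".join(p) for p in product("0123", repeat=3))
    if not has_square(w) and not any(f in w for f in FORBIDDEN)
]
offenders = [(w, c) for w in words for c in CUBES if c in "".join(G[x] for x in w)]
print(len(words), offenders)
```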

Corollary 6 If $g$ and $h$ are defined as above, then

    $g(h^\omega(0)) = 010011011010010110010011011001010011010110010011011001011010 \cdots$

is cubefree, and avoids all squares $xx$ with $|x| \ge 4$.

Next, based on the morphism $h$, we define the substitution $h' : \Sigma_4^* \to 2^{\Sigma_4^*}$ as follows:

    0 -> {h(0)}
    1 -> {h(1), 0310230201}
    2 -> {h(2)}
    3 -> {h(3)}

Thus, if $w \in \Sigma_4^*$, $h'(w)$ is a language of $2^r$ words over $\Sigma_4$, where $r = |w|_1$. Each of these
words is of length $10|w|$.

Lemma 7 Let $g$, $h$, and $h'$ be defined as above. Let $w = h^m(0)$ for some positive integer $m$.
Then $g(h'(w))$ is a language of $2^{n/300}$ words over $\Sigma_2$, where $n = 60 \cdot 10^m$ is the length of each
of these words. Furthermore, each of these words is cubefree and avoids all squares $xx$ with
$|x| \ge 4$.

Proof. Note that there are exactly two 1's in every image word of $h$. Hence,
$|w|_1 = \frac{1}{5}|w| = \frac{1}{5} \cdot 10^m$. We have then that $g(h'(w))$ consists of $2^{\frac{1}{5} \cdot 10^m}$ binary words.
Since $n = 6 \cdot 10 \cdot 10^m$, we see that $g(h'(w))$ consists of $2^{n/300}$ words.

To see that the words in $g(h'(w))$ are cubefree and avoid all squares $xx$ with $|x| \ge 4$,
it suffices by Theorem 3 to show that the words in $h'(w)$ are squarefree and contain no
occurrences of the subwords 12, 13, 21, 32, 231, or 10302. By the same reasoning as in
Theorem 1, the reader may easily verify that no word in $h'(w)$ contains an occurrence of 12,
13, 21, 32, 231, or 10302.

To show that the words in $h'(w)$ are squarefree, we will, as a notational convenience,
prefer to consider $h'$ to be a morphism defined as follows:

    0 -> h(0)
    1 -> h(1)
    $\hat{1}$ -> 0310230201
    2 -> h(2)
    3 -> h(3)

Here, 1 and $\hat{1}$ are considered to be the same alphabet symbol; the 'hat' simply serves to
distinguish between which choice is made for the substitution. To show that $h'(w)$ is squarefree,
it suffices to show that $h'$ satisfies the conditions of Lemma 2. For Lemma 2 (a) we have
$h'(22) = t\,h'(\hat{1})\,u$ (with $t = 0201$ and $u = 031023$), but we can rule this case out since $w$
avoids the square 22. For Lemma 2 (b) we again have that no two distinct letters have images
under $h'$ that share a prefix of length 6 or a suffix of length 5 (since 1 and $\hat{1}$ are not
considered to be distinct letters). Hence, $h'$ satisfies the conditions of Lemma 2, and so $h'(w)$
is squarefree. ■

Theorem 8 Let $G_n$ denote the number of cubefree binary words of length $n$ that avoid all
squares $xx$ with $|x| \ge 4$. Then $G_n = \Omega(1.002^n)$ and $G_n = O(1.178^n)$.

Proof. Noting that $2^{1/300} \approx 1.0023 > 1.002$, we see that the lower bound follows immediately from
Lemma 7.

For the upper bound we reason as follows. The set of binary words of length $n$ avoiding
cubes and squares $xx$ with $|x| \ge 4$ is a subset of the set of binary words avoiding 000 and
111. The number $G'_n$ of binary words avoiding 000 and 111 satisfies the linear recurrence
$G'_n = G'_{n-1} + G'_{n-2}$ for $n \ge 3$. From well-known properties of linear recurrences, it follows
that $G'_n = O(\alpha^n)$, where $\alpha$ is the largest zero of $x^2 - x - 1$, the characteristic polynomial of
the recurrence. Here $\alpha = (1 + \sqrt{5})/2 < 1.619$, so $G'_n = O(1.619^n)$.

This argument can be extended by using a symbolic algebra package such as Maple.
Noonan and Zeilberger [6] have written a Maple package, DAVID_IAN, that allows one to
specify a list $L$ of forbidden words, and computes the generating function enumerating words
avoiding members of $L$. We used the package for a list $L$ of 90 words of length $\le 20$:

    000, 111, ..., 11011001001101100100

obtaining a characteristic polynomial of degree 44 with dominant root $\approx 1.178$. ■

The following table gives the number $G_n$ of binary words of length $n$ avoiding both cubes
$xxx$ and squares $yy$ with $|y| \ge 4$.

    n    |   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
    G_n  |   1   2   4   6  10  16  24  36  52  72  90 116 142 178 220 264 332 414
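The entries of this table can be checked by exhaustive search. The following sketch is our own and is not the Maple computation mentioned above. It grows words one letter at a time and discards a word as soon as it ends in a cube or in a square $yy$ with $|y| \ge 4$; this is sufficient because every prefix of an admissible word is itself admissible.

```python
def ends_badly(word: str) -> bool:
    """True if word has a cube, or a square yy with |y| >= 4, as a suffix."""
    n = len(word)
    for p in range(1, n // 3 + 1):
        if word[n - 3 * p:n - 2 * p] == word[n - 2 * p:n - p] == word[n - p:]:
            return True
    for p in range(4, n // 2 + 1):
        if word[n - 2 * p:n - p] == word[n - p:]:
            return True
    return False


def count_words(max_n: int) -> list:
    """Return [G_0, G_1, ..., G_max_n] by extending admissible words letter by letter."""
    counts = [1]
    level = [""]
    for _ in range(max_n):
        level = [w + c for w in level for c in "01" if not ends_badly(w + c)]
        counts.append(len(level))
    return counts


print(count_words(17))
```

The same routine, with a longer range, gives a quick empirical feel for the growth rate bounded in Theorem 8.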



3 A uniform version of Fraenkel-Simpson

In this section we construct an infinite binary word avoiding all squares except $0^2$, $1^2$, and
$(01)^2$.

Roughly speaking, verifying that the image of a morphism avoids arbitrarily large squares
breaks up into two parts: checking a finite number of "small" squares, and checking an infinite
number of "large" squares. The small squares can be checked by brute force, while for the
large squares we need a version of Lemma 2. Referring to Lemma 2 (a), if $h(c)$ is a subword
of $h(ab)$ for some letters $a, b, c$, we call this an "inclusion". Inclusions can be ruled out either
by considering prefixes, as we did in Lemma 2 (a), or suffixes. Referring to Lemma 2 (b), if
$h(a) = st$, $h(b) = uv$, and $h(c) = sv$, we call that an "interchange".

The basic idea of the proofs in this section parallels that of the previous section, so we 
just sketch the basic ideas, pointing out the properties of the inclusions and interchanges. 

Consider the 24-uniform morphism $h$ defined as follows:

    0 -> 012321012340121012321234
    1 -> 012101234323401234321234
    2 -> 012101232123401232101234
    3 -> 012321234323401232101234
    4 -> 012321234012101234321234

Theorem 9 If $w \in \Sigma_5^*$ is squarefree and avoids the patterns 02, 03, 04, 14, 20, 30, 41, then
$h(w)$ is squarefree and avoids the patterns 02, 03, 04, 13, 14, 20, 24, 30, 31, 41, 42, 010, 434.

Proof. The only inclusion is $h(32) = 0123212343234\; h(0)\; 01232101234$, and 0123212343234
is not a suffix of the image of any letter.

There are no interchanges for this morphism. ■
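The inclusion named in the proof, together with two basic facts about the morphism that are used later (all images have length 24, and each contains at least three 0's), can be confirmed with a few string comparisons. The variable names in this sketch are ours.

```python
H24 = {
    "0": "012321012340121012321234",
    "1": "012101234323401234321234",
    "2": "012101232123401232101234",
    "3": "012321234323401232101234",
    "4": "012321234012101234321234",
}

t, u = "0123212343234", "01232101234"
# h(32) = t h(0) u, the inclusion used in the proof of Theorem 9
inclusion_ok = H24["3"] + H24["2"] == t + H24["0"] + u
# t must not be a suffix of any image word
t_is_suffix = any(image.endswith(t) for image in H24.values())
lengths = {len(image) for image in H24.values()}
min_zeros = min(image.count("0") for image in H24.values())
print(inclusion_ok, t_is_suffix, lengths, min_zeros)
```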

Now consider the 6-uniform morphism

    g(0) = 011100
    g(1) = 101100
    g(2) = 111000
    g(3) = 110010
    g(4) = 110001

Theorem 10 If $w$ is squarefree and avoids the patterns 02, 03, 04, 13, 14, 20, 24, 30, 31, 41, 42, 434010
then the only squares in $g(w)$ are 00, 11, 0101.

Proof.

There are no examples of interchanges for $g$.

There are multiple examples of inclusions, but many of them can be ruled out by
properties of $w$ and $g$:

• $g(02) = 01110\; g(0)\; 0$ but 02 cannot occur

• $g(24) = 1\; g(4)\; 10001$ but 24 cannot occur

• $g(12) = 10110\; g(0)\; 0$ but 10110 is not a suffix of any $g(a)$

• $g(32) = 11001\; g(0)\; 0$ but 11001 is not a suffix of any $g(a)$

• $g(21) = 1\; g(4)\; 01100$ but 01100 is not a prefix of any $g(a)$

• $g(23) = 1\; g(4)\; 10010$ but 10010 is not a prefix of any $g(a)$

Since $g(434010) = 1100\,(01110010110001)^2\,1100$, we need a special argument to rule this
out. There are four special cases that must be handled:

• $g(43) = 1100\; g(0)\; 10$

• $g(34) = 1100\; g(1)\; 01$

• $g(01) = 01\; g(3)\; 1100$

• $g(10) = 10\; g(4)\; 1100$

In the first example, $g(43) = 1100\; g(0)\; 10$, since 10 is only a prefix of $g(1)$, we can extend
on the right to get $g(43)\,1100 = 1100\; g(01)$. But since 1100 is only a prefix of $g(3)$ or $g(4)$,
this gives either the forbidden pattern 33 or the forbidden pattern 434.

In the second example, $g(34) = 1100\; g(1)\; 01$, since 01 is only a prefix of $g(0)$, we can
extend on the right to get $g(34)\,1100 = 1100\; g(10)$. But 1100 is a suffix of only $g(0)$ and $g(1)$,
so on the right we get either the forbidden pattern 010 or the forbidden pattern 11.

The other two cases are handled similarly. ■
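The string identities used in this proof are easy to confirm mechanically. The sketch below, with names of our own, checks the large square inside $g(434010)$ and the four special inclusions.

```python
G5 = {"0": "011100", "1": "101100", "2": "111000", "3": "110010", "4": "110001"}


def g(word: str) -> str:
    """Apply the 6-uniform morphism g over the alphabet {0,...,4}."""
    return "".join(G5[c] for c in word)


# g(434010) = 1100 (01110010110001)^2 1100
big_square_ok = g("434010") == "1100" + 2 * "01110010110001" + "1100"

# The four special inclusions, written as (ab, t, c, u) with g(ab) = t g(c) u.
SPECIAL = [("43", "1100", "0", "10"), ("34", "1100", "1", "01"),
           ("01", "01", "3", "1100"), ("10", "10", "4", "1100")]
special_ok = all(g(ab) == t + G5[c] + u for ab, t, c, u in SPECIAL)

# The two right extensions used in the first and second examples.
extension_ok = g("43") + "1100" == "1100" + g("01") and g("34") + "1100" == "1100" + g("10")
print(big_square_ok, special_ok, extension_ok)
```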

As in the previous section, we now define the substitution $h' : \Sigma_5^* \to 2^{\Sigma_5^*}$ as follows:

    0 -> {h(0), 012101232123401234321234}
    1 -> {h(1)}
    2 -> {h(2)}
    3 -> {h(3)}
    4 -> {h(4)}

Thus, if $w \in \Sigma_5^*$, $h'(w)$ is a language of $2^r$ words over $\Sigma_5$, where $r = |w|_0$. Each of these
words is of length $24|w|$.

Lemma 11 Let $g$, $h$, and $h'$ be defined as above. Let $w = h^m(0)$ for some positive integer
$m$. Then $g(h'(w))$ is a language of $2^{n/1152}$ words over $\Sigma_2$, where $n = 144 \cdot 24^m$ is the length
of each of these words. Furthermore, these words avoid all squares except $0^2$, $1^2$, and $(01)^2$.

Proof. The proof is analogous to that of Lemma 7. Note that there are at least three 0's
in every image word of $h$. Hence, $|w|_0 \ge \frac{1}{8}|w| = \frac{1}{8} \cdot 24^m$. We have then that $g(h'(w))$ consists
of at least $2^{\frac{1}{8} \cdot 24^m}$ binary words. Since $n = 6 \cdot 24 \cdot 24^m$, we see that $g(h'(w))$ consists of at
least $2^{n/1152}$ words.

To see that the words in $g(h'(w))$ avoid all squares except $0^2$, $1^2$, and $(01)^2$ it suffices by
Theorem 10 to show that the words in $h'(w)$ are squarefree and contain no occurrences of
the subwords 02, 03, 04, 13, 14, 20, 24, 30, 31, 41, 42, or 434010. The reader may easily
verify that the words in $h'(w)$ contain no occurrences of the subwords 02, 03, 04, 13, 14, 20,
24, 30, 31, 41, 42, or 434010.

To show that the words in $h'(w)$ are squarefree, we will, as before, consider $h'$ to be a
morphism defined as follows:

    0 -> h(0)
    $\hat{0}$ -> 012101232123401234321234
    1 -> h(1)
    2 -> h(2)
    3 -> h(3)
    4 -> h(4)

There are no inclusions for $h'$ other than the one identified in the proof of Theorem 9. There
are three interchanges: referring to Lemma 2 (b), we have that
$(a, b, c) \in \{(2, 1, \hat{0}), (2, 4, \hat{0}), (\hat{0}, 3, 2)\}$ satisfies $h'(a) = st$, $h'(b) = uv$, and $h'(c) = sv$. We may
rule out the first two cases by showing that $w$ avoids all subwords of the form $1\alpha 0\alpha 2$ and
$4\alpha 0\alpha 2$, where $\alpha \in \Sigma_5^*$. Note that in the word $w$, any occurrence of 0 must be followed by a 1,
since $w$ avoids the patterns 02, 03, and 04. Let $x$ be a subword of $w$ of the form $1\alpha 0\alpha 2$ or
$4\alpha 0\alpha 2$. Then $x$ must begin with 11 or 41. This is a contradiction, as $w$ avoids both 11 and 41.

We may rule out the third case by showing that $w$ avoids all subwords of the form $3\alpha 2\alpha 0$,
where $\alpha \in \Sigma_5^*$. Note that in the word $w$, any occurrence of 2 must be followed by either 1 or
3, since $w$ avoids the patterns 20 and 24. Let $x$ be a subword of $w$ of the form $3\alpha 2\alpha 0$. Then
$x$ must begin with 31 or 33. This is a contradiction, as $w$ avoids both 31 and 33. ■

Theorem 12 Let $H_n$ denote the number of binary words of length $n$ that avoid all squares
except $0^2$, $1^2$, and $(01)^2$. Then $H_n = \Omega(1.0006^n)$ and $H_n = O(1.135^n)$.

Proof. The proof is analogous to that of Theorem 8. Noting that $2^{1/1152} \approx 1.0006$, we see
that the lower bound follows immediately from Lemma 11.

For the upper bound, we again used the DAVID_IAN Maple package for a list of 65 words
of length $\le 20$:

    0000, 1010, ..., 1110001011100010

obtaining a characteristic polynomial of degree 58 with dominant root $\approx 1.135$. ■

The following table gives the number $H_n$ of binary words of length $n$ containing only the
squares $0^2$, $1^2$, and $(01)^2$.

    n    |   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
    H_n  |   1   2   4   8  13  22  31  46  58  78  99 124 144 176 198 234 262 300 351
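As with the previous table, these values can be checked by exhaustive search. In the sketch below (our own, independent of the Maple computation) a word is extended letter by letter and rejected as soon as it ends in a square other than 00, 11, or 0101.

```python
ALLOWED = {"00", "11", "0101"}


def ends_with_forbidden_square(word: str) -> bool:
    """True if some suffix of word is a square xx other than 00, 11, 0101."""
    n = len(word)
    for half in range(1, n // 2 + 1):
        if word[n - 2 * half:n - half] == word[n - half:] and word[n - 2 * half:] not in ALLOWED:
            return True
    return False


def count_words(max_n: int) -> list:
    """Return [H_0, H_1, ..., H_max_n] by extending admissible words letter by letter."""
    counts = [1]
    level = [""]
    for _ in range(max_n):
        level = [w + c for w in level for c in "01" if not ends_with_forbidden_square(w + c)]
        counts.append(len(level))
    return counts


print(count_words(18))
```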
4 The Prodinger-Urbanek problem

Prodinger and Urbanek [7] stated they were unable to find an example of two infinite binary
words avoiding large squares such that their perfect shuffle had arbitrarily large squares. In
this section we give an example of such words.

Theorem 13 There exist two infinite binary words $x$ and $y$ such that neither $x$ nor $y$
contains a square $ww$ with $|w| \ge 4$, but $x$ Ш $y$ contains arbitrarily large squares.

Proof. Consider the morphism $f : \Sigma_2^* \to \Sigma_2^*$ defined as follows:

    f(0) = 001
    f(1) = 110.

We will show that $f^\omega(0) = 001001110001001110110110001 \cdots$ contains arbitrarily large
squares and is the perfect shuffle of two words, each avoiding squares $ww$ with $|w| \ge 4$.

First, we define the morphisms $h : \Sigma_4^* \to \Sigma_4^*$, $g_1 : \Sigma_4^* \to \Sigma_2^*$, and $g_2 : \Sigma_4^* \to \Sigma_2^*$ as
follows:

    h(0) = 012        g_1(0) = 001        g_2(0) = 010
    h(1) = 302        g_1(1) = 101        g_2(1) = 100
    h(2) = 031        g_1(2) = 010        g_2(2) = 011
    h(3) = 321,       g_1(3) = 110,       g_2(3) = 101.

We now show

Lemma 14 $f^\omega(0) = g_2(h^\omega(0))$ Ш $g_1(h^\omega(0))$.

Proof. We prove the following identities by induction on $n$.

    $f^{n+1}(00) = g_2(h^n(0))$ Ш $g_1(h^n(0))$     (1)
    $f^{n+1}(10) = g_2(h^n(1))$ Ш $g_1(h^n(1))$     (2)
    $f^{n+1}(01) = g_2(h^n(2))$ Ш $g_1(h^n(2))$     (3)
    $f^{n+1}(11) = g_2(h^n(3))$ Ш $g_1(h^n(3))$     (4)

It is easy to verify that these equations hold for $n = 0$. We assume that they hold for $n = k$,
where $k \ge 0$, and show that they hold for $n = k + 1$. We first consider $f^{k+2}(00)$, where we
have

    $f^{k+2}(00) = f^{k+1}(001001)$
               $= f^{k+1}(00)\, f^{k+1}(10)\, f^{k+1}(01)$
               $= (g_2(h^k(0))$ Ш $g_1(h^k(0)))\,(g_2(h^k(1))$ Ш $g_1(h^k(1)))\,(g_2(h^k(2))$ Ш $g_1(h^k(2)))$
               $= (g_2(h^k(0))\, g_2(h^k(1))\, g_2(h^k(2)))$ Ш $(g_1(h^k(0))\, g_1(h^k(1))\, g_1(h^k(2)))$
               $= g_2(h^k(0) h^k(1) h^k(2))$ Ш $g_1(h^k(0) h^k(1) h^k(2))$
               $= g_2(h^k(012))$ Ш $g_1(h^k(012))$
               $= g_2(h^{k+1}(0))$ Ш $g_1(h^{k+1}(0))$

as desired. The other cases of the induction for $f^{k+2}(10)$, $f^{k+2}(01)$, and $f^{k+2}(11)$ follow
similarly. The result now follows from (1). ■
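The four identities can be tested numerically for small $n$. The sketch below is only a sanity check and not a substitute for the induction; the helper names are ours, and the correspondence 00 -> 0, 10 -> 1, 01 -> 2, 11 -> 3 is read off from identities (1)-(4).

```python
F = {"0": "001", "1": "110"}
H = {"0": "012", "1": "302", "2": "031", "3": "321"}
G1 = {"0": "001", "1": "101", "2": "010", "3": "110"}
G2 = {"0": "010", "1": "100", "2": "011", "3": "101"}
# Pairs of bits on the left-hand side and the matching letter on the right-hand side.
PAIRS = {"00": "0", "10": "1", "01": "2", "11": "3"}


def apply_morphism(m: dict, word: str) -> str:
    return "".join(m[c] for c in word)


def iterate(m: dict, start: str, times: int) -> str:
    word = start
    for _ in range(times):
        word = apply_morphism(m, word)
    return word


def perfect_shuffle(w: str, x: str) -> str:
    return "".join(a + b for a, b in zip(w, x))


def identities_hold(n: int) -> bool:
    """Check f^(n+1)(bits) = g2(h^n(letter)) shuffled with g1(h^n(letter)) for all four pairs."""
    for bits, letter in PAIRS.items():
        core = iterate(H, letter, n)
        right = perfect_shuffle(apply_morphism(G2, core), apply_morphism(G1, core))
        if iterate(F, bits, n + 1) != right:
            return False
    return True


print([identities_hold(n) for n in range(5)])
```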

We now prove 
Lemma 15 The infinite word $h^\omega(0)$ is squarefree.

Proof. This follows immediately by the analogue of Lemma 2. An easy computation
shows there are no inclusions or interchanges for $h$. ■

We now define 

A = {010, 013, 021, 030, 032, 102, 121, 131, 202, 212, 231, 301, 303, 312, 320, 323}. 

Lemma 16 (a) $h^\omega(0)$ contains no subwords $x$ where $x \in A$; and

(b) $h^\omega(0)$ contains no subwords of the form $0\alpha 1\alpha 3$, $1\alpha 0\alpha 2$, $2\alpha 3\alpha 1$, or $3\alpha 2\alpha 0$, where
$\alpha \in \Sigma_4^*$.

Proof.

(a) This can be verified by inspection.

(b) We argue by contradiction. Let $w$ be a shortest subword of $h^\omega(0)$ such that $w$ is of the
form $0\alpha 1\alpha 3$, $1\alpha 0\alpha 2$, $2\alpha 3\alpha 1$, or $3\alpha 2\alpha 0$. Suppose $w$ is of the form $0\alpha 1\alpha 3$. Note that
the only image words of $h$ that contain the letter 1 are $h(0) = 012$, $h(2) = 031$, and
$h(3) = 321$. Hence it must be the case that $\alpha$ is of the form $2\alpha' 0$, $\alpha' 03$, or $\alpha' 32$ for
some $\alpha' \in \Sigma_4^*$. We therefore have three cases.

Case 1: $w = 02\alpha' 012\alpha' 03$ for some $\alpha' \in \Sigma_4^*$. We have two subcases.

Case 1.i: $|w| \le 12$. A short computation suffices to verify that, contrary to (a), all
words $w$ of the form $02\alpha' 012\alpha' 03$ with $|w| \le 12$ contain a subword $x$ where
$x \in A$.

Case 1.ii: $|w| > 12$. We first make the observation that any image word of $h$ is
uniquely specified by its first two letters and also by its last two letters.
Thus, if $w = 02\alpha' 012\alpha' 03$, it must be the case that $3w1 = 302\alpha' 012\alpha' 031 =
h(1)\alpha' h(0)\alpha' h(2)$ is also a subword of $h^\omega(0)$. Furthermore, since $h$ has no
inclusions, the infinite word $h^\omega(0)$ can be uniquely parsed into image words
of $h$. Since we have that $h(1)\alpha' h(0)\alpha' h(2)$ is a subword of $h^\omega(0)$, this
implies that $|\alpha'|$ is a multiple of 3 and that $h(1)\alpha' h(0)\alpha' h(2) = h(1\beta 0\beta 2)$ for
some $\beta \in \Sigma_4^*$, $|\beta| < |\alpha'|$. So $1\beta 0\beta 2$ must also be a subword of $h^\omega(0)$. This
contradicts the minimality of $w$.

Case 2: $w = 0\alpha' 031\alpha' 033$ for some $\alpha' \in \Sigma_4^*$. But then $w$ contains the square 33, contrary
to Lemma 15.

Case 3: $w = 0\alpha' 321\alpha' 323$ for some $\alpha' \in \Sigma_4^*$. But by (a) $w$ cannot contain the subword
323.

The cases where $w$ is of the form $1\alpha 0\alpha 2$, $2\alpha 3\alpha 1$, or $3\alpha 2\alpha 0$ follow similarly.

■

We now give the analogue of Lemma 2 for $g_1$ and $g_2$. Let $g_i$ represent either $g_1$ or $g_2$.
Then we have

Lemma 17 (a) Suppose $g_i(ab) = t\,g_i(c)\,u$ for some letters $a, b, c \in \Sigma_4$ and words $t, u \in \Sigma_2^*$.
Then at least one of the following holds:

(i) this inclusion is trivial (that is, $t = \epsilon$ or $u = \epsilon$);
(ii) $u$ is not a prefix of $g_i(d)$ for any $d \in \Sigma_4$;
(iii) $t$ is not a suffix of $g_i(d)$ for any $d \in \Sigma_4$; or

(iv) for all $v, w \in \Sigma_2^*$ and all $e, e' \in \Sigma_4$, if $v\,g_i(ab)\,w = g_i(ece')$, then at least one of the
following holds:

(A) this inclusion is trivial (that is, $v = \epsilon$ or $w = \epsilon$);

(B) $w$ is not a prefix of $g_i(d)$ for any $d \in \Sigma_4$;

(C) $v$ is not a suffix of $g_i(d)$ for any $d \in \Sigma_4$;

(D) either $e = c$ or $e' = c$;

(E) $ece' \in A$;

(F) for all $x, y \in \Sigma_2^*$ and all $k \in \Sigma_4$, if $g_i(kab)\,x = y\,g_i(ece')$, then $k = a$; or

(G) for all $x, y \in \Sigma_2^*$ and all $k \in \Sigma_4$, if $x\,g_i(abk) = g_i(ece')\,y$, then $k = b$.

(b) Suppose there exist letters $a, b, c \in \Sigma_4$ and words $s, t, u, v \in \Sigma_2^*$ such that $g_i(a) = st$,
$g_i(b) = uv$, $g_i(c) = sv$, and $b\alpha c\alpha a$ is a subword of $h^\omega(0)$ for some $\alpha \in \Sigma_4^*$. Then either
$a = c$ or $b = c$.
Proof.

(a) We give one example of each case and list the other non-trivial cases in a table below.

(i) Trivial.

(ii) $g_2(32) = 1\,g_2(0)\,11$, but 11 is not a prefix of $g_2(d)$ for any $d \in \Sigma_4$.

(iii) $g_1(02) = 00\,g_1(1)\,0$, but 00 is not a suffix of $g_1(d)$ for any $d \in \Sigma_4$.

(iv) (A) Trivial.

(B) $g_2(01) = 01\,g_2(0)\,0$ and $1\,g_2(01)\,11 = g_2(302)$, but 11 is not a prefix of $g_2(d)$ for
any $d \in \Sigma_4$.

(C) $g_1(31) = 1\,g_1(1)\,01$ and $00\,g_1(31)\,0 = g_1(012)$, but 00 is not a suffix of $g_1(d)$ for
any $d \in \Sigma_4$.

(D) $g_1(23) = 0\,g_1(1)\,10$ and $01\,g_1(23)\,1 = g_1(211)$, but $e' = c = 1$.

(E) $g_1(21) = 0\,g_1(1)\,01$ and $01\,g_1(21)\,0 = g_1(212)$, but $ece' = 212 \in A$.

(F) $g_2(30) = 1\,g_2(0)\,10$, $01\,g_2(30)\,0 = g_2(201)$, and $g_2(330)\,0 = 1\,g_2(201)$, but $k =
a = 3$.

(G) $g_1(12) = 10\,g_1(1)\,0$, $0\,g_1(12)\,01 = g_1(210)$, and $0\,g_1(122) = g_1(210)\,0$, but $k =
b = 2$.



Table 1: Forbidden Patterns in the Proof of Lemma 17

Case     g_i(ab) = t g_i(c) u      v g_i(ab) w = g_i(ece')     g_i(kab) x = y g_i(ece') or
                                                               x g_i(abk) = g_i(ece') y

a.ii     g_2(01) = 0 g_2(3) 00
         g_2(02) = 0 g_2(1) 11
         g_2(31) = 1 g_2(2) 00
         g_2(32) = 1 g_2(0) 11

a.iii    g_1(01) = 00 g_1(3) 1
         g_1(02) = 00 g_1(1) 0
         g_1(31) = 11 g_1(2) 1
         g_1(32) = 11 g_1(0) 0

a.iv.B   g_2(01) = 01 g_2(0) 0     1 g_2(01) 11 = g_2(302)
         g_2(32) = 10 g_2(3) 1     0 g_2(32) 00 = g_2(031)

a.iv.C   g_1(02) = 0 g_1(2) 10     11 g_1(02) 1 = g_1(321)
         g_1(31) = 1 g_1(1) 01     00 g_1(31) 0 = g_1(012)

a.iv.D   g_1(10) = 1 g_1(2) 01     00 g_1(10) 0 = g_1(022)     -
         g_1(10) = 1 g_1(2) 01     10 g_1(10) 0 = g_1(122)     -
         g_1(02) = 0 g_1(2) 10     01 g_1(02) 1 = g_1(221)     -
         g_1(23) = 0 g_1(1) 10     01 g_1(23) 1 = g_1(211)     -
         g_1(23) = 0 g_1(1) 10     11 g_1(23) 1 = g_1(311)     -
         g_1(31) = 1 g_1(1) 01     10 g_1(31) 0 = g_1(112)     -
         g_2(01) = 01 g_2(0) 0     1 g_2(01) 10 = g_2(300)     -
         g_2(13) = 10 g_2(0) 1     0 g_2(13) 00 = g_2(001)     -
         g_2(13) = 10 g_2(0) 1     0 g_2(13) 01 = g_2(003)     -
         g_2(20) = 01 g_2(3) 0     1 g_2(20) 10 = g_2(330)     -
         g_2(20) = 01 g_2(3) 0     1 g_2(20) 11 = g_2(332)     -
         g_2(32) = 10 g_2(3) 1     0 g_2(32) 01 = g_2(033)     -

a.iv.E   g_1(12) = 10 g_1(1) 0     0 g_1(12) 10 = g_1(212)     -
         g_1(12) = 10 g_1(1) 0     1 g_1(12) 10 = g_1(312)     -
         g_1(12) = 1 g_1(2) 10     00 g_1(12) 1 = g_1(021)     -
         g_1(12) = 1 g_1(2) 10     10 g_1(12) 1 = g_1(121)     -
         g_1(21) = 0 g_1(1) 01     01 g_1(21) 0 = g_1(212)     -
         g_1(21) = 0 g_1(1) 01     11 g_1(21) 0 = g_1(312)     -
         g_1(21) = 01 g_1(2) 1     0 g_1(21) 01 = g_1(021)     -
         g_1(21) = 01 g_1(2) 1     1 g_1(21) 01 = g_1(121)     -
         g_2(03) = 01 g_2(0) 1     1 g_2(03) 00 = g_2(301)     -
         g_2(03) = 01 g_2(0) 1     1 g_2(03) 01 = g_2(303)     -
         g_2(03) = 0 g_2(3) 01     01 g_2(03) 0 = g_2(030)     -
         g_2(03) = 0 g_2(3) 01     01 g_2(03) 1 = g_2(032)     -
         g_2(30) = 1 g_2(0) 10     10 g_2(30) 0 = g_2(301)     -
         g_2(30) = 1 g_2(0) 10     10 g_2(30) 1 = g_2(303)     -
         g_2(30) = 10 g_2(3) 0     0 g_2(30) 10 = g_2(030)     -
         g_2(30) = 10 g_2(3) 0     0 g_2(30) 11 = g_2(032)     -

a.iv.F   g_2(03) = 0 g_2(3) 01     10 g_2(03) 0 = g_2(130)     g_2(003) 0 = 0 g_2(130)
         g_2(03) = 0 g_2(3) 01     10 g_2(03) 1 = g_2(132)     g_2(003) 1 = 0 g_2(132)
         g_2(30) = 1 g_2(0) 10     01 g_2(30) 0 = g_2(201)     g_2(330) 0 = 1 g_2(201)
         g_2(30) = 1 g_2(0) 10     01 g_2(30) 1 = g_2(203)     g_2(330) 1 = 1 g_2(203)

a.iv.G   g_1(12) = 10 g_1(1) 0     0 g_1(12) 01 = g_1(210)     0 g_1(122) = g_1(210) 0
         g_1(12) = 10 g_1(1) 0     1 g_1(12) 01 = g_1(310)     1 g_1(122) = g_1(310) 0
         g_1(21) = 01 g_1(2) 1     0 g_1(21) 10 = g_1(023)     0 g_1(211) = g_1(023) 1
         g_1(21) = 01 g_1(2) 1     1 g_1(21) 10 = g_1(123)     1 g_1(211) = g_1(123) 1



(b) The only $a, b, c$ that satisfy $g_1(a) = st$, $g_1(b) = uv$, and $g_1(c) = sv$ such that $a \ne c$ and
$b \ne c$ are $(a, b, c) \in \{(0, 3, 2), (1, 2, 3), (2, 1, 0), (3, 0, 1)\}$. But by Lemma 16 the infinite
word $h^\omega(0)$ contains no subwords of the form $3\alpha 2\alpha 0$, $2\alpha 3\alpha 1$, $1\alpha 0\alpha 2$, or $0\alpha 1\alpha 3$. This
contradicts the assumption that $b\alpha c\alpha a$ is a subword of $h^\omega(0)$ for some $\alpha \in \Sigma_4^*$. The
same result holds true for $g_2$.

■

Lemma 18 Neither $g_1(h^\omega(0))$ nor $g_2(h^\omega(0))$ contains squares $yy$ with $|y| \ge 4$.

Proof. As in the earlier arguments of this kind, this follows from Lemma 17. ■

We can now complete the proof of Theorem 13. Let $x := g_2(h^\omega(0)) = 010100011101010011 \cdots$
and $y := g_1(h^\omega(0)) = 001101010110001010 \cdots$. Then by Lemma 14 we have $x$ Ш $y = f^\omega(0)$.
But $f^\omega(0) = f^\omega(001)$ and so $f^\omega(0)$ begins with $f^n(0) f^n(0)$ for all $n \ge 0$. Hence $f^\omega(0)$ begins
with an arbitrarily large square.

On the other hand, by Lemma 18 we have that $x$ and $y$ avoid all squares $ww$ with
$|w| \ge 4$. ■
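The conclusion of Theorem 13 can be observed on finite prefixes. The following sketch, whose helper names and chosen prefix length are our own, builds the first 729 letters of $x$ and $y$, checks by brute force that neither prefix contains a square $ww$ with $|w| \ge 4$, and confirms that their perfect shuffle is the square $f^6(0) f^6(0)$. It illustrates the statement on a prefix only and proves nothing about the infinite words.

```python
F = {"0": "001", "1": "110"}
H = {"0": "012", "1": "302", "2": "031", "3": "321"}
G1 = {"0": "001", "1": "101", "2": "010", "3": "110"}
G2 = {"0": "010", "1": "100", "2": "011", "3": "101"}


def apply_morphism(m: dict, word: str) -> str:
    return "".join(m[c] for c in word)


def iterate(m: dict, start: str, times: int) -> str:
    word = start
    for _ in range(times):
        word = apply_morphism(m, word)
    return word


def perfect_shuffle(w: str, x: str) -> str:
    return "".join(a + b for a, b in zip(w, x))


def has_large_square(word: str, min_half: int = 4) -> bool:
    """True if word contains a factor ww with |w| >= min_half."""
    n = len(word)
    return any(
        word[i:i + half] == word[i + half:i + 2 * half]
        for half in range(min_half, n // 2 + 1)
        for i in range(n - 2 * half + 1)
    )


core = iterate(H, "0", 5)            # 243 letters of the fixed point of h
x = apply_morphism(G2, core)         # 729-letter prefix of x
y = apply_morphism(G1, core)         # 729-letter prefix of y
half = iterate(F, "0", 6)            # f^6(0), of length 729
shuffled = perfect_shuffle(x, y)
print(has_large_square(x), has_large_square(y), shuffled == half * 2)
```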